{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Molecule MolGraph featurizers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from chemprop.featurizers.molgraph.molecule import SimpleMoleculeMolGraphFeaturizer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is an example molecule to featurize."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from rdkit import Chem\n",
    "\n",
    "mol_to_featurize = Chem.MolFromSmiles(\"CC\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Simple molgraph featurizer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A `MolGraph` represents the graph featurization of a molecule. It is made of atom features (`V`), bond features (`E`), and a mapping between atoms and bonds (`edge_index` and `rev_edge_index`). It is created by `SimpleMoleculeMolGraphFeaturizer`. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MolGraph(V=array([[0.     , 0.     , 0.     , 0.     , 0.     , 1.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        1.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        1.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.12011],\n",
       "       [0.     , 0.     , 0.     , 0.     , 0.     , 1.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        1.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        1.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.12011]], dtype=float32), E=array([[0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],\n",
       "       [0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.]]), edge_index=array([[0, 1],\n",
       "       [1, 0]]), rev_edge_index=array([1, 0]))"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "featurizer = SimpleMoleculeMolGraphFeaturizer()\n",
    "featurizer(mol_to_featurize)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### CuikmolmakerMolGraphFeaturizer (available with `cuik-molmaker` only)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This featurizer turns a batch of SMILES strings into a featurized batched graph. It relies on the `cuik-molmaker` package to (1) convert SMILES strings into `Chem.Mol` objects, (2) calculate the features of atoms and bonds, and (3) batch all the individual graphs together. \n",
    "\n",
    "When creating the featurizer, you can give it options to control step 1 (currently only `add_h`). For step 2, the featurizer takes `atom_featurizer_mode` to determine which atom and bond features to calculate, instead of separate atom and bond featurizers.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<chemprop.featurizers.molgraph.molecule.BatchCuikMolGraph at 0x706e1b865760>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chemprop.featurizers.molgraph.molecule import CuikmolmakerMolGraphFeaturizer\n",
    "from chemprop.utils.utils import is_cuikmolmaker_available\n",
    "\n",
    "if is_cuikmolmaker_available():\n",
    "    featurizer = CuikmolmakerMolGraphFeaturizer(atom_featurizer_mode=\"organic\", add_h=True)\n",
    "    bmg = featurizer([\"C\" * i for i in range(1, 20)])\n",
    "    print(bmg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Custom"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The [atom](./atom_featurizers.ipynb) and [bond](./bond_featurizers.ipynb) featurizers used by the molgraph featurizer are customizable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MolGraph(V=array([[0.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 1.     , 0.     , 1.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 1.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 1.     , 0.     ,\n",
       "        0.     , 0.12011],\n",
       "       [0.     , 0.     , 1.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 1.     , 0.     , 0.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 1.     , 0.     , 1.     , 0.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 0.     , 1.     ,\n",
       "        0.     , 0.     , 0.     , 0.     , 0.     , 1.     , 0.     ,\n",
       "        0.     , 0.12011]], dtype=float32), E=array([[0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],\n",
       "       [0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.]]), edge_index=array([[0, 1],\n",
       "       [1, 0]]), rev_edge_index=array([1, 0]))"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from chemprop.featurizers import MultiHotAtomFeaturizer, MultiHotBondFeaturizer\n",
    "\n",
    "atom_featurizer = MultiHotAtomFeaturizer.organic()\n",
    "bond_featurizer = MultiHotBondFeaturizer(stereos=[0, 1, 2, 3, 4])\n",
    "featurizer = SimpleMoleculeMolGraphFeaturizer(\n",
    "    atom_featurizer=atom_featurizer, bond_featurizer=bond_featurizer\n",
    ")\n",
    "featurizer(mol_to_featurize)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Extra atom and bond features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If your [datapoints](../data/datapoints.ipynb) have extra atom or bond features, the molgraph featurizer needs to know the length of the extra features when it is created so that the empty `Chem.Mol` (`Chem.MolFromSmiles(\"\")`) is featurized correctly and so that the bond feature array is the correct shape."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_extra_atom_features = 3\n",
    "n_extra_bond_features = 4\n",
    "featurizer = SimpleMoleculeMolGraphFeaturizer(\n",
    "    extra_atom_fdim=n_extra_atom_features, extra_bond_fdim=n_extra_bond_features\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The [dataset](../data/datasets.ipynb) is given this custom featurizer and automatically handles the featurization including passing extra atom and bond features for each datapoint. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "chemprop",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
